-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-3414][SQL] Replace LowerCaseSchema with Resolver #2382
Conversation
/cc @liancheng |
/cc @ericl |
QA tests have started for PR 2382 at commit
|
QA tests have finished for PR 2382 at commit
|
QA tests have started for PR 2382 at commit
|
QA tests have finished for PR 2382 at commit
|
QA tests have started for PR 2382 at commit
|
QA tests have started for PR 2382 at commit
|
QA tests have finished for PR 2382 at commit
|
QA tests have finished for PR 2382 at commit
|
@@ -22,4 +22,9 @@ package org.apache.spark.sql.catalyst | |||
* Analysis consists of translating [[UnresolvedAttribute]]s and [[UnresolvedRelation]]s | |||
* into fully typed objects using information in a schema [[Catalog]]. | |||
*/ | |||
package object analysis | |||
package object analysis { | |||
type Resolver = (String, String) => Boolean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolver
probably a general name, can we use a more precise name for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will actually end up providing more general resolution functionality in the long term. I've added some scala doc for clarity though.
LGTM except some minor issues mentioned in the comments :) |
Oh, one more thing, please help rename this test case: spark/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala Line 525 in 0a7091e
|
QA tests have started for PR 2382 at commit
|
…sensitive resolution is still case preserving.
QA tests have started for PR 2382 at commit
|
QA tests have finished for PR 2382 at commit
|
QA tests have finished for PR 2382 at commit
|
(nestedFields, expression.dataType) match { | ||
case (Nil, _) => expression | ||
case (requestedField :: rest, StructType(fields)) => | ||
val actualField = fields.filter(f => resolver(f.name, requestedField)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a problem here. Currently a.b[1].c.d
will be parsed as GetField(GetField(GetItem(Unresolved("a.b"), 1), "c"), "d")
, so the case-sensitive-check only happens when resolve Unresolved("a.b")
to GetField(Attribute("a"), "b")
. Something like "SELECT a[0].A.A from nested" will fail for hql on case-sensitive-check.
I think we should also do this check in GetField
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, good point. Right now you can't make a SQLContext case insensitive, but when you can this will be problem. Maybe you should note this on SPARK-3617
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, sorry... Is that how the HiveQL parser will do it too? I'm not a huge fan of moving resolution logic into the expressions. What about a rule that only ran in case insensitive mode that fixes unresolved GetFields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this bug exists in HiveQL. I have opened a PR to fix this(adding a rule to fix unresolved GetField
s).#2543 Need your comments :)
This PR introduces a subtle change in semantics for HiveContext when using the results in Python or Scala. Specifically, while resolution remains case insensitive, it is now case preserving.
This PR is a follow up to #2293 (and to a lesser extent #2262 #2334).
In #2293 the catalog was changed to store analyzed logical plans instead of unresolved ones. While this change fixed the reported bug (which was caused by yet another instance of us forgetting to put in a
LowerCaseSchema
operator) it had the consequence of breaking assumptions made byMultiInstanceRelation
. Specifically, we can't replace swap out leaf operators in a tree without rewriting changed expression ids (which happens when you self join the same RDD that has been registered as a temp table).In this PR, I instead remove the need to insert
LowerCaseSchema
operators at all, by moving the concern of matching up identifiers completely into analysis. Doing so allows the test cases from both #2293 and #2262 to pass at the same time (and likely fixes a slew of other "unknown unknown" bugs).While it is rolled back in this PR, storing the analyzed plan might actually be a good idea. For instance, it is kind of confusing if you register a temporary table, change the case sensitivity of resolution and now you can't query that table anymore. This can be addressed in a follow up PR.
Follow-ups: